background-image: url(https://raw.githubusercontent.com/tidyverse/purrr/master/man/figures/logo.png)
background-position: 95% 5%
background-size: 7.5%
layout: true


##purrr: Functional Programming Tools

purrr facilitates functional programming (FP) with vectors, lists, and data frame objects (e.g., tibbles) in R. Whenever you would normally reach for a for-loop to solve an iterative problem, the family of map_*() functions allows you to rephrase your problem as a data manipulation pipeline.

Four types of mapping function:

- map(.x, .f, ...) takes the input .x and applies .f to each element in .x.
- group_map(.data, .f, ...) (from dplyr) takes a grouped tibble and applies .f to each subgroup.
- map2(.x, .y, .f, ...) takes the inputs .x and .y and applies .f to the paired elements of .x and .y in parallel.
- pmap(.l, .f, ...) takes a list .l of inputs and applies .f to the elements of all inputs in .l in parallel.
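A minimal sketch of the purrr mapping patterns listed here, run on toy inputs. The vectors and the argument names a, b, c are made up for illustration; group_map() is left out because it needs a grouped tibble.

```r
library(purrr)

# map(): one input, .f is applied to each element
squares <- map(1:3, ~ .x^2)

# map2(): two inputs, iterated in parallel (.x and .y are paired by position)
sums <- map2(1:3, 4:6, ~ .x + .y)

# pmap(): any number of inputs, supplied as one list and matched by position
products <- pmap(list(1:3, 4:6, 7:9), function(a, b, c) a * b * c)

unlist(products)  # 28 80 162
```

All three calls return a list with one element per iteration, which is why unlist() is used for a compact printout.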

.pull-left[
By default, map() returns a list. If you want to be more explicit about the output type, you may use

- map_lgl() to receive a logical vector,
- map_chr() to receive a character vector,
- map_int() to receive an integer vector,
- map_dbl() to receive a double vector, or
- map_df() to receive a data frame.
]

.pull-right[
The input .x to any map_*() function can be a vector, a list, or a data frame:

- Vector: iteration over vector entries
- List: iteration over list elements
- Data frame: iteration over columns
]
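The typed variants can be tried on small inputs. This is only a sketch: the vector x and the choice of the built-in mtcars data set are assumptions made for illustration.

```r
library(purrr)

x <- c(a = 1.5, b = 2.5, c = 4)

as_list <- map(x, ~ .x * 2)                    # list (the default)
doubled <- map_dbl(x, ~ .x * 2)                # named double vector
is_big  <- map_lgl(x, ~ .x > 2)                # named logical vector
labels  <- map_chr(x, ~ paste0("value_", .x))  # named character vector

# List input: iteration over list elements
n_elements <- map_int(list(1:3, letters), length)

# Data frame input: iteration over columns
col_means <- map_dbl(mtcars[c("mpg", "hp")], mean)
```

Each typed variant raises an error when .f returns something that does not fit the requested type, so a type mismatch surfaces immediately.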

??? Comments


##purrr: Functional Programming Tools

Use Case: Applying the z-normalization to multiple variables

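# Standardize a numeric vector to mean 0 and standard deviation 1
z_transform <- function(.x) {
  mu <- mean(.x)     # avoid shadowing the function names mean() and sd()
  sigma <- sd(.x)
  (.x - mu) / sigma
}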
samples <- list(sample1 = rnorm(10, 75, 22), sample2 = rnorm(10, 52, 11), sample3 = rnorm(10, 99, 33))
## $sample1
##  [1]  75.68782 125.66495  83.25217  90.76536 117.68463  52.62998  56.95321  78.48313
##  [9]  80.83747  78.62980
## 
## $sample2
##  [1] 68.34106 50.80146 54.25619 49.83732 56.72566 50.77191 44.80929 47.04905 54.57193
## [10] 48.12263
## 
## $sample3
##  [1]  90.96179 132.67138  78.97472  90.17809 177.76370 102.07773  73.75073  89.41950
##  [9]  96.61828  48.50498

??? comment


##purrr: Functional Programming Tools

Use Case: Applying the z-normalization to multiple variables

for (s in samples) {
  print(z_transform(s))
}
map(.x = samples, .f = ~z_transform(.x)) #equivalent to map(samples, z_transform)
## $sample1
##  [1] -0.36358052  1.80708422 -0.03503686  0.29128489  1.46047378 -1.36505525 -1.17728370
##  [8] -0.24217110 -0.13991465 -0.23580082
## 
## $sample2
##  [1]  2.3803664 -0.2600081  0.2600609 -0.4051472  0.6318090 -0.2644558 -1.1620556
##  [8] -0.8248872  0.3075912 -0.6632736
## 
## $sample3
##  [1] -0.20242455  0.98168256 -0.54272922 -0.22467325  2.26182333  0.11314960 -0.69103477
##  [8] -0.24620885 -0.04184039 -1.40774447
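One way to sanity-check such a result is to map mean() and sd() over the transformed list: every standardized sample should have mean 0 and standard deviation 1, up to floating-point error. The sketch below redefines z_transform() compactly so that it runs on its own. The seed and the two samples are assumptions, so the numbers differ from the output shown on the slide.

```r
library(purrr)

z_transform <- function(x) (x - mean(x)) / sd(x)

set.seed(42)  # assumption: fixed seed, only for reproducibility
samples <- list(
  sample1 = rnorm(10, 75, 22),
  sample2 = rnorm(10, 52, 11)
)

z <- map(samples, z_transform)

z_means <- map_dbl(z, mean)  # approximately 0 for every sample
z_sds   <- map_dbl(z, sd)    # approximately 1 for every sample
```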

??? comments


##purrr: Functional Programming Tools

Use Case: Applying the z-normalization to multiple variables

# Anonymous function
map(
  .x = samples,
  .f = function(.x) {
    (.x - mean(.x, na.rm = TRUE)) / sd(.x, na.rm = TRUE)
  })
# Equivalent formula shorthand
map(
  .x = samples,
  .f = ~(.x - mean(.x, na.rm = TRUE)) / sd(.x, na.rm = TRUE))
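A small sketch showing that these ways of writing .f give identical results. The toy list and the variable names are assumptions; the third form uses the base R lambda shorthand, which requires R 4.1 or later.

```r
library(purrr)

# Toy input with one missing value (hypothetical data)
samples <- list(a = c(1, 2, 3, NA), b = c(10, 20, 30, 40))

via_function <- map(samples, function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE))
via_formula  <- map(samples, ~ (.x - mean(.x, na.rm = TRUE)) / sd(.x, na.rm = TRUE))
via_lambda   <- map(samples, \(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE))

identical(via_function, via_formula)  # TRUE
identical(via_function, via_lambda)   # TRUE
```

With na.rm = TRUE the mean and standard deviation ignore the missing value, but the missing entry itself stays NA in the output.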

??? comments


##purrr: Functional Programming Tools

.center[
This is great, right?!
]

--

.center[
Now let us look at some other practical use cases!
]

??? src: https://tenor.com/view/the-office-finger-guns-right-on-steve-carell-michael-scott-gif-4724041


##purrr: Functional Programming Tools

Check the data types of my columns:

penguins %>%
  map_df(class)
## # A tibble: 1 x 8
##   species  island  bill_length_mm bill_depth_mm flipper_length_~ body_mass_g sex    year 
##   <chr>    <chr>   <chr>          <chr>         <chr>            <chr>       <chr>  <chr>
## 1 charact~ charac~ numeric        numeric       numeric          numeric     chara~ nume~

Check the number of missing values per column:

penguins %>%
  map_df(~sum(is.na(.)))
## # A tibble: 1 x 8
##   species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g   sex  year
##     <int>  <int>          <int>         <int>             <int>       <int> <int> <int>
## 1       0      0              2             2                 2           2    11     0
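When a one-row tibble is not needed, the typed variants return a named vector instead. The sketch below uses the built-in airquality data set as a stand-in for penguins (an assumption, chosen so the example runs without extra packages).

```r
library(purrr)

# One class label per column; map_chr() expects exactly one string per column
col_types <- map_chr(airquality, class)

# Number of missing values per column as a named integer vector
na_counts <- map_int(airquality, ~ sum(is.na(.x)))
```

Note that class() can return more than one value for some column types (for example ordered factors or date-times), in which case map_chr() fails and map() is the safer choice.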

??? comments


##purrr: Functional Programming Tools

Check the number of distinct values per column:

penguins %>%
  map_df(n_distinct)
## # A tibble: 1 x 8
##   species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g   sex  year
##     <int>  <int>          <int>         <int>             <int>       <int> <int> <int>
## 1       3      3            165            81                56          95     3     3

??? comments


##purrr: Functional Programming Tools

Check the highest value in each subset of the data (e.g., largest flipper_length_mm per sex):

penguins %>%
  drop_na %>% 
  group_by(sex) %>%
  group_map(~slice_max(., flipper_length_mm, n = 1), .keep = TRUE)
## [[1]]
## # A tibble: 1 x 8
##   species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex     year
##   <chr>   <chr>           <dbl>         <dbl>             <dbl>       <dbl> <chr>  <dbl>
## 1 Gentoo  Biscoe           46.9          14.6               222        4875 female  2009
## 
## [[2]]
## # A tibble: 1 x 8
##   species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex    year
##   <chr>   <chr>           <dbl>         <dbl>             <dbl>       <dbl> <chr> <dbl>
## 1 Gentoo  Biscoe           54.3          15.7               231        5650 male   2008
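The same pattern, sketched on the built-in mtcars data set so that it runs without the penguins data. The data set and the column choice are assumptions; with_ties = FALSE is added so that each group yields exactly one row.

```r
library(dplyr)

heaviest_per_cyl <- mtcars %>%
  group_by(cyl) %>%
  # .x is the subset of rows belonging to one group;
  # .keep = TRUE keeps the grouping column cyl inside each subset
  group_map(~ slice_max(.x, wt, n = 1, with_ties = FALSE), .keep = TRUE)

length(heaviest_per_cyl)  # one list element per group
```

group_map() always returns a list; if a single data frame is preferred, the list can be combined afterwards with bind_rows().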

??? drop_na: because otherwise I would also have a subgroup of NA


##purrr: Functional Programming Tools

map2() also comes in handy if you want to produce a series of identical plots, each depicting a separate subset of the underlying data:

species <- penguins %>% distinct(species, year) %>% pull(species) # .x argument for map2()
years <- penguins %>% distinct(species, year) %>% pull(year)      # .y argument for map2()

penguin_plots <- map2(
  .x = species,
  .y = years,
  .f = ~{
    penguins %>%
      drop_na %>% 
      filter(species == .x, year == .y) %>% 
      ggplot() +
        geom_point(aes(x = bill_length_mm, y = body_mass_g)) +
        labs(title = glue::glue("Scatter Plot Bill Length vs. Body Mass ({.x}, {.y})"))
    })
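An alternative sketch keeps the species-year combinations together in one data frame and iterates over its rows with pmap(), so the two inputs cannot get out of sync. The toy data frame and its values are hypothetical, and the plotting step is replaced by plain subsetting to keep the example free of further dependencies.

```r
library(purrr)

# Toy stand-in for penguins (hypothetical values)
toy <- data.frame(
  species = c("Adelie", "Adelie", "Gentoo", "Gentoo", "Gentoo"),
  year    = c(2007, 2008, 2007, 2007, 2008),
  mass    = c(3700, 3800, 5000, 5100, 5200)
)

# One row per distinct species-year combination
combos <- unique(toy[c("species", "year")])

# pmap() over a data frame iterates row by row and matches columns
# to function arguments by name
subsets <- pmap(combos, function(species, year) {
  toy[toy$species == species & toy$year == year, ]
})

map_int(subsets, nrow)  # 1 1 2 1
```

In the plotting use case, the body of the function would build the ggplot object for the subset instead of returning the rows.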

??? comments


##purrr: Functional Programming Tools

.pull-left[

penguin_plots[[1]]

] .pull-right[

penguin_plots[[4]]

]

??? comments


##purrr: Functional Programming Tools

Finally, map() is really powerful in the context of modelling. In the following, we fit a separate linear regression model for each species-island subset.

nested_penguins <- penguins %>% 
  drop_na %>% 
  group_by(species, island) %>% 
  nest
## # A tibble: 5 x 3
## # Groups:   species, island [5]
##   species   island    data              
##   <chr>     <chr>     <list>            
## 1 Adelie    Torgersen <tibble [47 x 6]> 
## 2 Adelie    Biscoe    <tibble [44 x 6]> 
## 3 Adelie    Dream     <tibble [55 x 6]> 
## 4 Gentoo    Biscoe    <tibble [119 x 6]>
## 5 Chinstrap Dream     <tibble [68 x 6]>

.pull-right[ .footnote[ Note: To access elements in a nested tibble, you may use pluck(). For example, to access the first tibble in the column data, run nested_penguins %>% pluck("data", 1).]]
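A short sketch of how pluck() walks into nested structures. The nested list below is a hypothetical stand-in for the data list-column, built from plain data frames so the example is self-contained.

```r
library(purrr)

nested <- list(
  data = list(
    data.frame(id = 1:2, mass = c(3700, 3800)),
    data.frame(id = 3:5, mass = c(5000, 5100, 5200))
  )
)

# First data frame stored under "data"
first_tbl <- pluck(nested, "data", 1)

# Accessors can be chained: second data frame, column mass, first value
first_mass <- pluck(nested, "data", 2, "mass", 1)  # 5000

# A missing element yields .default instead of an error
missing_tbl <- pluck(nested, "data", 3, .default = NA)
```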

??? comments


##purrr: Functional Programming Tools

nested_penguins <- nested_penguins %>% 
  mutate(lin_reg = map(.x = data, .f = ~lm(body_mass_g ~ ., data = .x))) 
## # A tibble: 5 x 4
## # Groups:   species, island [5]
##   species   island    data               lin_reg
##   <chr>     <chr>     <list>             <list> 
## 1 Adelie    Torgersen <tibble [47 x 6]>  <lm>   
## 2 Adelie    Biscoe    <tibble [44 x 6]>  <lm>   
## 3 Adelie    Dream     <tibble [55 x 6]>  <lm>   
## 4 Gentoo    Biscoe    <tibble [119 x 6]> <lm>   
## 5 Chinstrap Dream     <tibble [68 x 6]>  <lm>

??? comments


##purrr: Functional Programming Tools

nested_penguins %>% 
  mutate(coefs = map(lin_reg, ~summary(.x) %>% .$coefficients %>% as_tibble)) %>%
  unnest(coefs)
## # A tibble: 30 x 8
## # Groups:   species, island [5]
##   species island    data              lin_reg  Estimate `Std. Error` `t value` `Pr(>|t|)`
##   <chr>   <chr>     <list>            <list>      <dbl>        <dbl>     <dbl>      <dbl>
## 1 Adelie  Torgersen <tibble [47 x 6]> <lm>    449264.      130401.       3.45     0.00133
## 2 Adelie  Torgersen <tibble [47 x 6]> <lm>         4.20        17.3      0.243    0.809  
## 3 Adelie  Torgersen <tibble [47 x 6]> <lm>       -62.0         54.6     -1.14     0.263  
## 4 Adelie  Torgersen <tibble [47 x 6]> <lm>        15.5          8.74     1.77     0.0838 
## # ... with 26 more rows

.footnote[ Note: You may eventually want to drop the lin_reg and data columns; otherwise, you carry around a lot of redundant data in your tibble, which may quickly exceed your available memory.]
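Converting a coefficient matrix with as_tibble alone drops its row names, so the table no longer says which term each row belongs to. The sketch below is one possible way to keep them; it uses the built-in mtcars data set and a simple model mpg ~ wt per cylinder group, both of which are assumptions made for illustration.

```r
library(purrr)
library(dplyr)

# One linear model per cylinder group
fits <- mtcars %>%
  split(.$cyl) %>%
  map(~ lm(mpg ~ wt, data = .x))

coef_table <- fits %>%
  map(function(fit) {
    m <- coef(summary(fit))  # matrix with one row per model term
    # Store the row names in an explicit term column
    data.frame(term = rownames(m), m, row.names = NULL, check.names = FALSE)
  }) %>%
  bind_rows(.id = "cyl")     # the list names become the cyl column
```

The result holds only the group label, the term, and the coefficient statistics, so the fitted models and the nested data do not need to be carried along.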

??? there are packages for automatically doing this with just one line of code, see broom


##purrr: Functional Programming Tools

.pull-left[ .center[ How you probably feel right now

]]

--

.pull-right[ .center[ How you will feel after mastering the intricacies of FP

]]

.footnote[ .pull-left[ For a great tutorial that helps you master functional programming with R, see this blog post by Rebecca Barter.]]